Support RAID configuration for baremetal server #292

Merged
metal3-io-bot merged 3 commits into metal3-io:master from longkb:support_raid_configuration on Mar 8, 2021
Conversation

@longkb (Contributor) commented Aug 28, 2019

Support RAID configuration for baremetal server

Currently, Metal3 does not support deploying baremetal servers with a RAID configuration (#206).
This PR extends the Ironic provisioner to support RAID configuration as follows:

  • Extend the BaremetalHost CRD to support RAID configuration.
  • Generate Ironic's clean steps from the raid property in the host's spec.
  • Trigger manual cleaning to configure RAID in the provisioning phase.
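
The second and third bullets can be sketched roughly as below. This is a minimal illustration and not the code from this PR: `RAIDConfig`, `Volume`, and `CleanStep` are hypothetical local stand-ins (the PR itself works with gophercloud's `nodes.CleanStep`). The only parts taken from Ironic are the `raid` interface name and its `delete_configuration` and `create_configuration` clean steps.

```go
package main

import "fmt"

// CleanStep is a hypothetical local stand-in for an Ironic clean step:
// the hardware interface to call and the step to run on it.
type CleanStep struct {
	Interface string
	Step      string
}

// Volume and RAIDConfig are simplified, hypothetical versions of the
// host spec fields discussed in this PR.
type Volume struct {
	Level string // for example "0", "1" or "1+0"
}

type RAIDConfig struct {
	Volumes []Volume
}

// buildRAIDCleanSteps returns the manual clean steps needed to apply raid.
// A nil config is treated as "leave RAID alone" (an assumption of this
// sketch), so no steps are produced.
func buildRAIDCleanSteps(raid *RAIDConfig) []CleanStep {
	if raid == nil {
		return nil
	}
	// Remove the existing configuration first so the outcome depends
	// only on the spec.
	steps := []CleanStep{{Interface: "raid", Step: "delete_configuration"}}
	if len(raid.Volumes) > 0 {
		steps = append(steps, CleanStep{Interface: "raid", Step: "create_configuration"})
	}
	return steps
}

func main() {
	steps := buildRAIDCleanSteps(&RAIDConfig{Volumes: []Volume{{Level: "1"}}})
	for _, s := range steps {
		fmt.Printf("%s.%s\n", s.Interface, s.Step)
	}
}
```

The resulting list is what would be handed to Ironic when the node is moved into manual cleaning.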

@longkb longkb changed the title Support raid configuration Support RAID configuration for baremetal server Aug 28, 2019
@longkb longkb force-pushed the support_raid_configuration branch from 043f835 to 48ef146 Compare August 28, 2019 13:49
@nordixinfra

Can one of the admins verify this patch?

@dhellmann (Member) left a comment

This version is looking pretty good.

I think we can improve the validation by using kubebuilder features. I have tried to give an example in an inline comment.

I would like for us to add the smallest possible API for managing RAID. It is easier to add to the API than to remove from it once it is being used. If there are reasonable default behaviors, we should leave options out, for now. If there are advanced features, we may never want to include them.

I think the code to start the manual cleaning process needs to be run in more places. I have tried to explain in more detail inline.

Comment thread pkg/apis/metal3/v1alpha1/baremetalhost_types.go Outdated

// Integer, number of disks to use for the logical disk. Defaults to minimum number of disks required
// for the particular RAID level.
NumberOfPhysicalDisks int `json:"numberOfPhysicalDisks,omitempty"`
Member

When would you ever set this to anything other than the default for a given RAID mode?

Contributor Author

I have never tried setting this before.

Member

Well, you can have a non-default number of disks, especially for RAID-0.

Member

OK.

Member

How does the validation instruction work with this field? If the user doesn't specify a value, will we get 0? But if they do specify a value then the validation ensures that it is > 0?

Contributor Author

Oops, I missed that point.
If the user doesn't specify a value, the field passes kubebuilder validation and we get 0.
If they do specify a value, it is validated by kubebuilder on line 129: // +kubebuilder:validation:Minimum=1
So I think a pointer would be suitable for handling the case where this field is missing.
Thanks

Member

Unresolved this because it appears we decided on a pointer here but the field didn't actually change.

Also for software RAID we're supposed to be able to supply device hints in the same format as the root device hints we added in #495, and the DiskType and NumberOfPhysicalDisks do not apply. If we're going to keep the software RAID thing I wonder if we should separate the software and hardware RAID configurations into separate structures.

@zouy414 (Member) commented Jun 19, 2020

How about this:

// RAIDConfig contains the configuration required to set up RAID on a bare metal server.
type RAIDConfig struct {
	// If true, remove the existing RAID configuration before applying the new one.
	DeleteExisting bool `json:"deleteExisting,omitempty"`

	// The list of logical disks for hardware RAID; the first volume is the root volume.
	HVolumes []HRAIDVolume `json:"hVolumes,omitempty"`

	// The list of logical disks for software RAID; the first volume is the root volume.
	// If HVolumes is set, these items are ignored.
	// The number of created software RAID devices must be 1 or 2.
	// If there is only one software RAID device, it has to be a RAID-1.
	// If there are two, the first one has to be a RAID-1, while the RAID level for the second one can be 0, 1, or 1+0.
	// As the first RAID device will be the deployment device,
	// enforcing a RAID-1 reduces the risk of ending up with a non-booting node in case of a disk failure.
	SVolumes []SRAIDVolume `json:"sVolumes,omitempty"`
}
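
The software RAID rules listed in the comment on `SVolumes` could be enforced with a check along these lines. This is only a sketch: the function name `validateSoftwareRAIDLevels` and its string-slice input are hypothetical, and treating an empty list as valid is an assumption rather than something the proposal states.

```go
package main

import (
	"errors"
	"fmt"
)

// validateSoftwareRAIDLevels checks the rules from the SVolumes comment,
// given the RAID level of each software RAID device in order.
func validateSoftwareRAIDLevels(levels []string) error {
	switch len(levels) {
	case 0:
		return nil // assumption: nothing to configure is acceptable
	case 1, 2:
		// allowed counts, checked further below
	default:
		return errors.New("the number of software RAID devices must be 1 or 2")
	}
	// The first device is the deployment device and must be RAID-1.
	if levels[0] != "1" {
		return errors.New("the first software RAID device must be RAID-1")
	}
	if len(levels) == 2 {
		switch levels[1] {
		case "0", "1", "1+0":
		default:
			return fmt.Errorf("unsupported RAID level %q for the second device", levels[1])
		}
	}
	return nil
}

func main() {
	for _, levels := range [][]string{{"1"}, {"1", "1+0"}, {"0"}, {"1", "5"}} {
		fmt.Println(levels, "->", validateSoftwareRAIDLevels(levels))
	}
}
```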

@longkb longkb force-pushed the support_raid_configuration branch from 48ef146 to e41d52a Compare September 4, 2019 11:11
@longkb (Contributor Author) commented Sep 4, 2019

@dhellmann, @dtantsur: Thanks for your comments.
Sorry for the late reply. I just pushed a revision to address your comments :)

@longkb longkb force-pushed the support_raid_configuration branch 2 times, most recently from 3ab9a5e to d7a3d5a Compare September 9, 2019 10:15
@longkb longkb force-pushed the support_raid_configuration branch from d7a3d5a to 6f19d1c Compare September 11, 2019 10:04
@longkb longkb force-pushed the support_raid_configuration branch from 6f19d1c to abd2434 Compare September 26, 2019 02:03
@longkb (Contributor Author) left a comment

@dhellmann Thanks for your review. I have pushed some revisions to this PR :)

@longkb longkb force-pushed the support_raid_configuration branch from abd2434 to 2368edb Compare October 4, 2019 02:26
@metal3-io-bot metal3-io-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 4, 2019
Comment thread pkg/apis/metal3/v1alpha1/baremetalhost_types.go
@longkb longkb force-pushed the support_raid_configuration branch from f685941 to d592398 Compare January 9, 2020 01:07

// The valid values must be "hdd" or "ssd". If this is not specified, disk type will not be a criterion to find backing physical disks
// +kubebuilder:validation:Enum=hdd,ssd
DiskType nodes.DiskType `json:"diskType,omitempty"`
Member

If this field is optional, don't we need for it to be a pointer? Or does DiskType have a good zero value?

Contributor Author

Although Ironic will handle the zero value for DiskType, I think it would be better to make it a pointer.
Thanks

Member

This still needs to be addressed. What is the zero value for nodes.DiskType? What default do we want for this field?

@zouy414 (Member) commented Apr 28, 2020

I have no idea, but in the OpenStack docs some of the examples don't set disk_type, and I found this explanation:

disk_type - hdd or ssd. If this is not specified, disk type will not be a criterion to find backing physical disks.

@zouy414 (Member) commented Apr 28, 2020

Or we can check DiskType and return an error if it is empty.

Member

It's just a string so the zero value is an empty string 🤷‍♂️

Member

We probably do have to specifically allow an empty value in the kubebuilder validation on the previous line though, I think?

Everywhere else in the API (HardwareDetails and root device hints) we have had a bool named Rotational instead of the strings "hdd" and "ssd". Let's do that here also and make this a bool pointer.
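
A bool pointer gives three states (unset, rotational, non-rotational) that map naturally onto Ironic's optional `disk_type`. The helper below is a hypothetical sketch of that mapping; `diskTypeFor` is not a function from this PR, and the empty string stands for "disk type is not a criterion", as in the Ironic documentation quoted earlier in this thread.

```go
package main

import "fmt"

// diskTypeFor maps an optional Rotational flag to the disk_type value
// understood by Ironic. A nil pointer means the user expressed no
// preference, so no disk type is requested.
func diskTypeFor(rotational *bool) string {
	if rotational == nil {
		return ""
	}
	if *rotational {
		return "hdd"
	}
	return "ssd"
}

func main() {
	yes, no := true, false
	fmt.Printf("unset: %q\n", diskTypeFor(nil))
	fmt.Printf("rotational: %q\n", diskTypeFor(&yes))
	fmt.Printf("non-rotational: %q\n", diskTypeFor(&no))
}
```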


@longkb longkb force-pushed the support_raid_configuration branch from d592398 to bd43382 Compare January 10, 2020 07:39
@dtantsur (Member) left a comment

I don't think I can lgtm here, but looks good

}

// HardwareRAIDVolume defines the desired configuration of volume in hardware RAID
type HardwareRAIDVolume struct {
Member

nit: we should probably move this to GopherCloud eventually

SizeGibibytes *int `json:"sizeGibibytes,omitempty"`

// RAID level for the logical disk. The following levels are supported: 0;1;1+0.
// +kubebuilder:validation:Enum="0";"1";"1+0"
Member

Yep (I'm fine with a follow-up)

SizeGibibytes *int `json:"sizeGibibytes,omitempty"`

// RAID level for the logical disk. The following levels are supported: 0;1;1+0.
// +kubebuilder:validation:Enum="0";"1";"1+0"
Member

Let's probably open another patch and discuss there.

host.Status.Provisioning.RAID = &metal3v1alpha1.RAIDConfig{}
dirty = true
}
// If HardwareRAIDVolumes isn't nil, we will ignore SoftwareRAIDVolumes.
Member

Let's research this option in a follow-up.

Comment thread pkg/provisioner/ironic/raid.go Outdated
}

// A private method to build RAID disks
func buildTargetRAIDCfg(raid *metal3v1alpha1.RAIDConfig) (logicalDisks []nodes.LogicalDisk, err error) {
Member

nit: maybe make this public for reuse? e.g. we use these bits in openshift installer (if you care about it).

Comment thread pkg/provisioner/ironic/raid.go Outdated
}

// buildRAIDCleanSteps build the clean steps for RAID configuration from BaremetalHost spec
func buildRAIDCleanSteps(raid *metal3v1alpha1.RAIDConfig) (cleanSteps []nodes.CleanStep) {
Member

Same re public

@dtantsur (Member)

/test-integration

@dtantsur (Member)

/hold cancel

Signed-off-by: zouyu <zouy.fnst@cn.fujitsu.com>
@demonCoder95 (Member)

/test-integration

@zouy414 (Member) commented Mar 2, 2021

@andfasano @dtantsur @zaneb PTAL.

Comment thread pkg/provisioner/ironic/ironic.go Outdated
if p.bmcAccess.RAIDInterface() != "" {
cleanSteps = append(cleanSteps, BuildRAIDCleanSteps(p.host.Status.Provisioning.RAID)...)
} else if p.host.Status.Provisioning.RAID != nil {
return nil, fmt.Errorf("RAID settings are exist, but the node's driver %s does not support RAID", p.bmcAccess.Driver())
Member

typo: I guess it's RAID settings are defined

Member

Thanks for your review, done.

Comment thread pkg/provisioner/ironic/ironic.go Outdated
// Build manual clean steps
cleanSteps, err := p.buildManualCleaningSteps()
if err != nil {
result, err = transientError(err)
Member

A transientError produces an immediate re-enqueue of the reconcile loop. IIUC, since in this case the error generated by p.buildManualCleaningSteps() is synchronous (and may require human intervention to fix), I'd suggest using operationFailed(), since it will activate the retry mechanism with backoff.

Comment thread pkg/provisioner/ironic/ironic.go Outdated
var cleanSteps []nodes.CleanStep
cleanSteps, err = p.buildManualCleaningSteps()
if err != nil {
result, err = transientError(err)
Member

See above

},
},
}
}
Member

That's really a fragile approach, and moreover it hides the setup in the cases list. In other situations we've used the builder pattern to allow each case to specify its own settings (as for the testserver). That is not yet available for the host, but at least an enrichment function can be added to the test struct, to be invoked on the host only when defined.

Member

Thanks, I added a flag existRaidConfig to test cases.

Member

A good step in the right direction (at least more robust than before), even though the readability of a single test case is still not optimal (a builder approach would definitely be better); it is no longer a blocking point for me, though.
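
The "enrichment function in the test struct" suggested earlier in this thread might look roughly like the sketch below. All names (`host`, `raidConfig`, `testCase`, `enrich`, `newHost`) are hypothetical and stand in for the real test fixtures; the point is only that each case carries its own optional setup hook instead of relying on a shared flag.

```go
package main

import "fmt"

// raidConfig and host are hypothetical, minimal stand-ins for the
// BareMetalHost fixtures used in the provisioner tests.
type raidConfig struct {
	Level string
}

type host struct {
	Name string
	RAID *raidConfig
}

// testCase carries an optional enrich hook. A nil hook means the default
// host is used as-is, so the setup stays visible next to the case.
type testCase struct {
	name     string
	enrich   func(h *host)
	wantRAID bool
}

// newHost builds the default host and applies the case's hook, if any.
func newHost(tc testCase) *host {
	h := &host{Name: "test-host"}
	if tc.enrich != nil {
		tc.enrich(h)
	}
	return h
}

func main() {
	cases := []testCase{
		{name: "no RAID"},
		{
			name:     "RAID-1 requested",
			enrich:   func(h *host) { h.RAID = &raidConfig{Level: "1"} },
			wantRAID: true,
		},
	}
	for _, tc := range cases {
		h := newHost(tc)
		fmt.Printf("%s: has RAID = %v (want %v)\n", tc.name, h.RAID != nil, tc.wantRAID)
	}
}
```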

zouy414 added 2 commits March 3, 2021 09:21
Signed-off-by: zouyu <zouy.fnst@cn.fujitsu.com>
Signed-off-by: zouyu <zouy.fnst@cn.fujitsu.com>

@metal3-io-bot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andfasano, dtantsur, longkb, zaneb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@zouy414 (Member) commented Mar 4, 2021

@dhellmann @zaneb @andfasano and @dtantsur have approved this PR. Is there anything else blocking it from being merged?

@demonCoder95 (Member)

/test-integration

@demonCoder95 (Member)

@andfasano @dtantsur I think this needs an LGTM to proceed.

@dtantsur (Member) commented Mar 8, 2021

/lgtm

I don't think I actually have LGTM rights on this repository, but let's try.

@metal3-io-bot (Contributor)

@dtantsur: adding LGTM is restricted to approvers and reviewers in OWNERS files.

Details

In response to this:

/lgtm

I don't think I actually have LGTM rights on this repository, but let's try.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

maxLength: 64
type: string
numberOfPhysicalDisks:
description: Integer, number of disks to use for the logical disk. Defaults to minimum number of disks required for the particular RAID level.
Contributor

Suggested change
- description: Integer, number of disks to use for the logical disk. Defaults to minimum number of disks required for the particular RAID level.
+ description: Integer, number of physical disks to use for the logical disk. Defaults to minimum number of disks required for the particular RAID level.

- 1+0
type: string
physicalDisks:
description: A list of device hints, the number of item should be greater than or equal to 2.
Contributor

Suggested change
- description: A list of device hints, the number of item should be greater than or equal to 2.
+ description: A list of device hints, the number of items should be greater than or equal to 2.


@andfasano (Member)

/lgtm

@demonCoder95 @Hellcatlk there are still a couple of points pending, but they could be addressed in a followup:

@zouy414 zouy414 mentioned this pull request Mar 9, 2021
@zouy414 (Member) commented Mar 9, 2021

@andfasano I submitted a PR to resolve the comments left by @rdoxenham.


Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
lgtm: Indicates that a PR is ready to be merged.
size/XXL: Denotes a PR that changes 1000+ lines, ignoring generated files.
